A Method for Discovering Knowledge
Using Support Vector Machines



Inventor: Stephen D. Barnhill



Technical Field: 

The present invention relates to methods for extracting desired data from databases. More
particularly, the present invention relates to a method for extracting desired data from databases
from generated and collected sets of data relating to humans, animals, viruses,
bacteria, as well as accounting data, stock and commodity market data, and insurance data, in order
to effectively classify subgroups by virtue of a computational index (CompuDex™). The present
invention creates an effective method for multi-dimensional function estimation and resulting
CompuDex™ that can be applied to a wide range of problems including pattern recognition,
function approximation, regression estimation, molecular patterning, proportionality estimations and
signal processing.

The present invention further relates to a computer assisted method for classifying subgroups
utilizing pre-processed IntelliData™ (Intelligent Data created by pre-processing techniques utilized
specifically as part of the present invention). The IntelliData™ is then entered into one or more
Support Vector Machines which generate an optimal hyperplane algorithm. This optimal
hyperplane algorithm is then converted by one or more post-processing steps into a CompuDex™
(a single valued computationally derived numerical classifier) for interpretation by a human. In
summary, the present invention begins with raw data and, using support vector machines, then
concludes with a single valued computationally derived numerical classifier ready for human
interpretation.

In the preferred embodiment of the present invention, the method is used to classify individual
subgroups, based on pattern recognition techniques, from any combination of raw data. Examples
of the usefulness of this procedure could be demonstrated in 1.) gene[...]
protein specifically, 2.) diagnostics, 3.) evaluation of managed care efficiency, 4.) therapeutic
decisions and follow up, 5.) appropriate therapeutic triage, 6.) pharmaceutical development
techniques, 7.) discovery of molecular structures, 8.) prognostic evaluations, 9.) medical informatics,
10.) fraud, 11.) inventory control, 12.) stock evaluations and predictions, 13.) commodity
evaluations and predictions, and 14.) insurance probability estimates.

In another preferred embodiment, the invention includes a system to receive data from a remote data
transmitting station for processing through the invention and transmit the results to the same or
some other remote data receiving station.



Background of the Invention:

Knowledge discovery is the most desirable end-product of data collection. The last decade has
brought forward an explosive growth in our capabilities to generate, collect, and store vast
amounts of data. While database technology has provided the basic tools for the efficient storage
and collection of large data sets, the issue of how to help humans understand and analyze huge
bodies of data remains a difficult and unsolved problem. In order to deal with this data glut, a new
generation of intelligent tools for automated knowledge discovery is needed.



For example, there are huge scientific databases such as in the Human Genome Project which
include gigabytes of data on the human genetic code. Such volumes of
data clearly overwhelm the traditional manual methods of data analysis, such as spreadsheets and
ad-hoc queries. Those methods can create informative reports from data, but do not have the capacity
to discover the knowledge contained in the data. A significant need exists for a new generation of
techniques and tools with the ability to intelligently and automatically assist humans in analyzing the
mountains of data and finding patterns of useful knowledge.

Likewise, using traditionally accepted reference ranges and standards for interpretation, it is often
impossible for humans to identify patterns of useful knowledge even with very small amounts of
data.

This invention in part utilizes Support Vector Machines. The Support Vector Machine implements
the following idea: it maps the input vectors into high dimensional feature space through non-linear
mapping, chosen a priori. In this high dimensional feature space, an optimal separating hyperplane is
constructed. This Optimal Hyperplane Classifier Algorithm separates the various classes of interest.
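
The mapping idea can be illustrated with a minimal sketch. The feature map used here (appending the product of the two inputs) and the hand-picked hyperplane are assumptions chosen only for illustration; they are not the mapping or the classifier of the invention. The example shows data that no straight line separates in the input space becoming separable by a hyperplane in the feature space.

```python
# Illustrative only: feature_map and the hyperplane (w, b) are hypothetical choices.

def feature_map(x):
    """Map a 2-D input vector into a 3-D feature space by appending x1 * x2."""
    x1, x2 = x
    return (x1, x2, x1 * x2)

def classify(w, b, z):
    """Return +1 or -1 according to the side of the hyperplane w.z + b = 0."""
    score = sum(wi * zi for wi, zi in zip(w, z)) + b
    return 1 if score >= 0 else -1

# XOR-style data: the class is +1 when the two inputs have different signs.
# No straight line separates these two classes in the 2-D input space.
samples = [((-1, -1), -1), ((-1, 1), 1), ((1, -1), 1), ((1, 1), -1)]

# In the 3-D feature space a single hyperplane separates them.
w, b = (0.0, 0.0, -1.0), 0.0
predictions = [classify(w, b, feature_map(x)) for x, _ in samples]
```

In a Support Vector Machine the hyperplane is not chosen by hand as it is here; it is computed from the training vectors.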

The dimensionality of the feature space will be huge. For example, to construct a polynomial
of degree 4 or 5 in a 200 dimensional space it is necessary to construct hyperplanes in a billion
dimensional feature space. This curse of dimensionality can be solved by constructing the Optimal
hyperplane. If it happens that in the high dimensional input space one can construct a separating
hyperplane with a small norm of coefficients, the VC dimension of the corresponding element of the
structure will be small, and therefore the generalization ability of the constructed hyperplane will be high.
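
The size of such a feature space can be checked with a short calculation. The sketch below assumes one particular counting convention, namely one feature per monomial of total degree at most d in n input variables (constant term included), which gives the binomial coefficient C(n + d, d). Under that assumption, 200 inputs at degree 5 yield a count in the billions, consistent in order of magnitude with the figure quoted above.

```python
from math import comb

def polynomial_feature_count(n_inputs, degree):
    """Number of monomials of total degree <= `degree` in `n_inputs` variables."""
    return comb(n_inputs + degree, degree)

count_degree_4 = polynomial_feature_count(200, 4)   # tens of millions
count_degree_5 = polynomial_feature_count(200, 5)   # billions
```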

If the training vectors are separated by the Optimal hyperplane (or generalized Optimal hyperplane),
then the expectation value of the probability of committing an error on a test example is bounded by
the ratio of the expected number of support vectors to the number of examples in the training set.

This bound depends neither on the dimensionality of the space, nor on the norm of the vector of
coefficients, nor on the bound of the norm of the input vectors. Therefore, if the Optimal
hyperplane can be constructed from a small number of support vectors relative to the training set
size, the generalization ability will be high.

The problems with the Back-Propagation Neural Network approach, such as:

1.) The empirical risk functional has many local minima. Standard optimization procedures
guarantee convergence to one of them. The quality of the obtained solution depends on
many factors, in particular on the initialization of the weight matrices.

2.) The convergence of the gradient based method is rather slow.

3.) The sigmoid function has a scaling factor which affects the quality of the
approximation.

prevent neural networks from being well controlled learning machines.

These shortcomings of neural networks are overcome using Support Vector Machines by
constructing the Optimal hyperplane. Support Vector Machines are described in detail in The
Nature of Statistical Learning Theory by Vladimir Vapnik, which is incorporated herein by
reference in its entirety.



Summary of Invention:



The present invention is an apparatus and a process for classifying subgroups, based on pattern
recognition techniques, utilizing IntelliData™ (Intelligent Data created by pre-processing
techniques utilized specifically as part of the present invention). The IntelliData™ is then entered
into one or more Support Vector Machines which generate an optimal hyperplane algorithm. This
optimal hyperplane algorithm is then converted by one or more post-processing steps into a
CompuDex™ (a single valued computationally derived numerical classifier) for interpretation by a
human.

Generally, this objective is accomplished by performing the following steps:

1. Collect data in its original and/or natural form.

2. Optionally apply expert medical pre-processing techniques to derive MediData™
(Data derived from applying expert medical information to raw data).

3. Optionally apply mathematical (computational) pre-processing techniques to derive
CompuData™ (Data derived from applying mathematical (computational) information to
raw data).

4. Combine the results of MediData™ and CompuData™ with the original raw data to create IntelliData™.

5. Utilize the created IntelliData™ as input into one or more support vector machines.

6. Generate an optimal hyperplane classifier algorithm.

7. Apply mathematical post-processing techniques to create a CompuDex™.



Step 1 - Pre-processing

Raw Data ► MediData™ (Expert Medical Pre-processed Data) and CompuData™ (Computational Pre-processed Data) ► IntelliData™ (Intelligent Data)

Step 2 - Determination of the Optimal Hyperplane Classifier Algorithm

IntelliData™ (Intelligent Data) ► Support Vector Machine ► Optimal Hyperplane Classifier Algorithm Determination

Step 3 - Creation of a Mathematical CompuDex™ (Computational Index)
for Human Interpretation

Optimal Hyperplane Classifier Algorithm ► Post-processing ► CompuDex™ (Computational Index)



More detail on performing the steps in the invention is as follows:

1. Collect data in its original and/or natural form.

This initial step involves generating and collecting any given set of data that may
contain information which is not immediately apparent and needs to be evaluated to
identify any patterns of useful knowledge.

2. Optionally apply expert medical pre-processing techniques to derive MediData™.

This step actually creates an additional new set of input data (input vectors).

This next step in the invention involves the option of application of expert medical
pre-processing techniques to the raw data to create an additional set of input data
known as MediData™. Examples of expert medical pre-processing steps include
but are not limited to the following:

A. Association with known standard reference ranges
B. Physiologic Truncation
C. Physiologic Combinations
D. Biochemical Combinations
E. Application of Heuristic Rules
F. Diagnostic Criteria Determinations
G. Clinical Weighting Systems
H. Diagnostic Transformations
I. Clinical Transformations
J. Application of Expert Knowledge
K. Labeling Techniques
L. Application of other Domain Knowledge
M. Bayesian Network Knowledge
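
Two of the listed techniques, association with a reference range and physiologic truncation, can be sketched as follows. The numeric limits, the function names, and the three-way flag encoding are all hypothetical and chosen only for illustration; the source does not specify how these steps are computed, and the values are not clinical guidance.

```python
# Hypothetical limits for illustration only; not taken from the source.
REFERENCE_RANGE = (70.0, 100.0)      # assumed "normal" interval for an example analyte
PHYSIOLOGIC_LIMITS = (20.0, 600.0)   # assumed plausible bounds for the same analyte

def reference_range_flag(value, low, high):
    """Association with a reference range: -1 below, 0 within, +1 above."""
    if value < low:
        return -1
    if value > high:
        return 1
    return 0

def physiologic_truncation(value, low, high):
    """Clamp a raw value to the assumed physiologically plausible bounds."""
    return max(low, min(high, value))

def medi_data(raw_value):
    """Derive two expert-style attributes from one raw measurement."""
    truncated = physiologic_truncation(raw_value, *PHYSIOLOGIC_LIMITS)
    flag = reference_range_flag(truncated, *REFERENCE_RANGE)
    return [truncated, flag]

# Three made-up raw measurements: low, normal, and implausibly high.
derived = [medi_data(v) for v in (55.0, 85.0, 1200.0)]
```

Each raw measurement yields a short vector of derived attributes, which matches the description of this step as creating an additional set of input vectors.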

3. Optionally apply mathematical (computational) pre-processing techniques to
derive CompuData™.

This step actually creates an additional new set of input data (input vectors).

This next step in the invention involves the option of application of mathematical
(computational) pre-processing techniques to the raw data to create an additional set
of input data known as CompuData™. Examples of mathematical
(computational) pre-processing steps include but are not limited to the following:

A. Labeling
B. Binary Conversion
C. Logarithmic Transformation
D. Sine Transformation
E. Cosine Transformation
F. Tangent Transformation
G. Cotangent Transformation
H. Clustering
I. Summarization
J. Scaling
K. Probabilistic Analysis
L. Significance Testing
M. Strength Testing
N. Search for 2-D Regularities
O. Identify Equivalence Relations
P. Apply Contingency Tables
Q. Apply Graph Theory Principles
R. Create Vectorizing M...
S. Multiplication
T. Division
U. Addition
V. Subtraction
W. Application of Polynomial Equations
X. Application of Basic and Complex Statistics
Y. Identify Proportionalities
Z. Discriminatory Power Determination
AA. Apply Combinations of the Above Simultaneously
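
A few of these transformations can be sketched in plain Python. The function names, the choice of natural logarithm, min-max scaling as the scaling method, and the threshold value are assumptions made for illustration; the source names the techniques but does not define them.

```python
import math

def log_transform(values):
    """Logarithmic transformation (natural log); inputs must be positive."""
    return [math.log(v) for v in values]

def min_max_scale(values):
    """Scaling: map values linearly onto [0, 1]; a constant column maps to 0.0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

def binary_conversion(values, threshold):
    """Binary conversion: 1 if the value exceeds the threshold, else 0."""
    return [1 if v > threshold else 0 for v in values]

raw = [1.0, 10.0, 100.0]   # made-up raw attribute values
compu_data = {
    "log": log_transform(raw),
    "scaled": min_max_scale(raw),
    "binary": binary_conversion(raw, threshold=5.0),
}
```

Each transformation returns one new attribute per sample, so the outputs can be appended to the raw attributes as further input-vector components.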

4. Combine the results of MediData™ and CompuData™ with the original raw data to
create IntelliData™.

This step actually creates an additional new set of input data (input vectors).

This step of the invention combines the attributes (vectors) of the raw data, the
MediData™ and the CompuData™ to create an additional new set of input data (input
vectors) called IntelliData™ to be fed into the Support Vector Machines for
dimensional computation and mapping.
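
One simple reading of "combining the attributes" is concatenation of the attribute lists sample by sample. That interpretation, the function name, and the sample values below are assumptions for illustration; the source does not state how the combination is performed.

```python
def combine_vectors(raw, medi=None, compu=None):
    """Concatenate raw attributes with optional derived attributes, sample by sample."""
    medi = medi if medi is not None else [[] for _ in raw]
    compu = compu if compu is not None else [[] for _ in raw]
    if not (len(raw) == len(medi) == len(compu)):
        raise ValueError("all inputs must describe the same number of samples")
    return [list(r) + list(m) + list(c) for r, m, c in zip(raw, medi, compu)]

raw_rows = [[55.0], [85.0]]        # made-up raw measurements
medi_rows = [[-1], [0]]            # e.g. expert-derived flags
compu_rows = [[4.007], [4.443]]    # e.g. log-transformed values (rounded)
intelli_rows = combine_vectors(raw_rows, medi_rows, compu_rows)
```

Because both pre-processing steps are optional, the sketch accepts missing derived sets and falls back to the raw attributes alone.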

5. Utilize the created IntelliData™ as input into one or more support vector
machines.

This step of the invention utilizes the original raw data (vectors) along with the option of
using newly created vectors MediData™, CompuData™, and IntelliData™ to assist
in providing smarter data to the Support Vector Machine to allow for better computation
and high dimensional mapping in the creation of the optimal hyperplane algorithm.

6. Generate an optimal hyperplane classifier algorithm.

This step of the invention uses one or more Support Vector Machines to determine the
optimal hyperplane classifier algorithm. The kernel of the Support Vector Machine can
be a polynomial kernel, a radial basis classifier kernel, a neural network kernel, or any other
type of kernel that satisfies the Mercer Condition.
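
Two of the named kernel types can be written down directly. The sketch below uses the common textbook forms of the polynomial and radial basis kernels; the parameter names and default values (`degree`, `c`, `gamma`) are arbitrary assumptions. The Mercer Condition implies that the matrix of kernel values over any finite set of points is symmetric and positive semi-definite; the sketch only builds that matrix and does not verify positive semi-definiteness.

```python
import math

def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def polynomial_kernel(u, v, degree=3, c=1.0):
    """K(u, v) = (u.v + c) ** degree"""
    return (dot(u, v) + c) ** degree

def rbf_kernel(u, v, gamma=0.5):
    """K(u, v) = exp(-gamma * ||u - v||^2)"""
    sq_dist = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-gamma * sq_dist)

def gram_matrix(kernel, points):
    """Matrix of kernel values for every pair of points."""
    return [[kernel(p, q) for q in points] for p in points]

points = [(0.0, 1.0), (1.0, 0.0), (1.0, 1.0)]   # made-up input vectors
K_poly = gram_matrix(polynomial_kernel, points)
K_rbf = gram_matrix(rbf_kernel, points)
```

A kernel lets the machine work with inner products in the high dimensional feature space without constructing the mapped vectors explicitly.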

To construct the Optimal Hyperplane, one has to separate the vectors of the training set
belonging to two different classes using the hyperplane with the smallest norm of
coefficients. The Support Vector Machine implements the following idea: it maps the
input vectors into high dimensional feature space through some non-linear mapping,
chosen a priori. In this space, an optimal separating hyperplane is constructed. This
Optimal Hyperplane Classifier Algorithm separates the various classes of interest.
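
The "smallest norm of coefficients" criterion can be illustrated on a toy problem. This is a deliberate simplification: a real Support Vector Machine solves a quadratic optimization problem over all separating hyperplanes, whereas the sketch merely compares two hand-written candidates. The data, the candidates, and the function names are hypothetical. Each candidate is first rescaled so that the closest training vector has functional margin 1; among hyperplanes in that canonical form, a smaller coefficient norm means a wider margin (2 / norm).

```python
import math

def functional_margins(w, b, data):
    """y * (w.x + b) for every training pair; all positive means separation."""
    return [y * (sum(wi * xi for wi, xi in zip(w, x)) + b) for x, y in data]

def canonical_form(w, b, data):
    """Rescale (w, b) so the closest training vector satisfies y * (w.x + b) = 1."""
    smallest = min(functional_margins(w, b, data))
    if smallest <= 0:
        raise ValueError("hyperplane does not separate the training set")
    return [wi / smallest for wi in w], b / smallest

def norm(w):
    """Euclidean norm of the coefficient vector."""
    return math.sqrt(sum(wi * wi for wi in w))

# Made-up, linearly separable training set: (vector, class label).
data = [((0.0, 0.0), -1), ((0.0, 1.0), -1), ((2.0, 0.0), 1), ((2.0, 1.0), 1)]

# Two hypothetical candidate hyperplanes that both separate the classes.
candidates = [([1.0, 0.0], -1.0), ([1.0, 1.0], -1.5)]

canonical = [canonical_form(w, b, data) for w, b in candidates]
best_w, best_b = min(canonical, key=lambda wb: norm(wb[0]))
margin = 2.0 / norm(best_w)   # width of the gap between the two classes
```

Here the vertical line halfway between the two classes wins, because the tilted candidate needs larger coefficients to reach the canonical form.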

7. Apply mathematical post-processing techniques to create a CompuDex™.

This step of the invention takes the Optimal Hyperplane Classifier Algorithm and
optionally applies post-processing techniques to create a CompuDex™ (a
computational index) which can then be interpreted by a human. Examples of
post-processing steps include but are not limited to the following:

A. Reference Range Determinations
B. Scaling Techniques (linear and non-linear)
C. Transformations (linear and non-linear)
D. Probability Estimations
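
As a minimal sketch of such post-processing, the signed output of a hyperplane classifier can be scaled onto a bounded index and read against cut-offs. The logistic scaling, the 0 to 100 range, the cut-off values, and the example hyperplane are all assumptions chosen for illustration; the source does not specify how the CompuDex™ is computed.

```python
import math

def decision_value(w, b, x):
    """Signed score of x relative to the hyperplane w.x + b = 0."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def compu_index(score, steepness=1.0):
    """Non-linear scaling of a signed score onto 0-100 (50 = on the hyperplane)."""
    return 100.0 / (1.0 + math.exp(-steepness * score))

def interpret(index, reference_range=(40.0, 60.0)):
    """Reference-range style reading of the index; the cut-offs are assumed."""
    low, high = reference_range
    if index < low:
        return "class -1"
    if index > high:
        return "class +1"
    return "indeterminate"

w, b = [1.0, 0.0], -1.0   # hypothetical trained hyperplane
index_on_plane = compu_index(decision_value(w, b, (1.0, 0.5)))
index_positive = compu_index(decision_value(w, b, (4.0, 0.5)))
```

The result is a single number per sample, which is the property the text asks of the index: one value a human can read against a range.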

In summary, the present invention uses data pre-processing steps to create IntelliData™, which is
then analyzed in high dimensional space using one or more support vector machines, the
results of which are then subjected to post-processing steps to create a single valued
numerical classifier, CompuDex™ (a computational index), which can then be easily
interpreted by a human.



